p2p/rlpx: reduce allocation and syscalls #22899
Conversation
I'm also still testing this, and it seems the snappy buffer management doesn't fully work yet.
This is the minimal necessary change to make the next commit work.
This change significantly improves the performance of RLPx message reads and writes. In the previous implementation, reading and writing of message frames performed multiple reads and writes on the underlying network connection, and allocated a new []byte buffer for every read. In the new implementation, reads and writes reuse buffers and perform far fewer system calls on the underlying connection. This effectively doubles the theoretically achievable throughput on a single connection, as shown by the benchmark result:

name          old speed      new speed       delta
Throughput-8  66.5MB/s ± 0%  139.7MB/s ± 1%  +110.16%  (p=0.001 n=7+7)

The change also removes support for the legacy, pre-EIP-8 handshake encoding. As of May 2021, no actively maintained client sends this format.
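The buffer-reuse idea can be pictured roughly as follows. This is an illustrative sketch only: the type and method names are placeholders and not necessarily what the PR adds, but it shows why steady-state reads stop allocating.

import "io"

// readBuffer keeps one backing slice alive across messages so that
// reads in steady state do not allocate.
type readBuffer struct {
	data []byte // reused backing storage
	end  int    // number of valid bytes currently in data
}

// read pulls n more bytes from r into the buffer and returns the slice
// holding exactly those bytes. The backing array only grows when a
// message needs more room than any previous one.
func (b *readBuffer) read(r io.Reader, n int) ([]byte, error) {
	start := b.end
	if need := start + n; need > len(b.data) {
		if need > cap(b.data) {
			grown := make([]byte, need)
			copy(grown, b.data[:start])
			b.data = grown
		} else {
			b.data = b.data[:need]
		}
	}
	b.end = start + n
	if _, err := io.ReadFull(r, b.data[start:b.end]); err != nil {
		return nil, err
	}
	return b.data[start:b.end], nil
}

On the write side the same idea applies: the whole frame (header, header MAC, encrypted body, frame MAC) can be assembled in one reused buffer and handed to the connection in a single Write call instead of several small writes per message.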
The snappy functions only check the length of the destination buffer and ignore its capacity. When message sizes alternated between large and small, we would still allocate a new buffer for every other message. This fixes it.
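A growslice-style helper for this only needs to guarantee the length, since that is what snappy.Encode checks. A minimal sketch, assuming the helper does nothing more than that (its call site is quoted just below):

// growslice returns b with length at least n, reusing the existing
// backing array whenever its capacity already allows it. The contents
// need not be preserved, because snappy.Encode overwrites them.
func growslice(b []byte, n int) []byte {
	if cap(b) >= n {
		return b[:n]
	}
	return make([]byte, n)
}

Returning b[:n] rather than leaving the slice short is the key point: snappy.Encode only looks at len(dst), so a long-enough length is what prevents the reallocation when message sizes alternate.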
c.snappyWriteBuffer = growslice(c.snappyWriteBuffer, snappy.MaxEncodedLen(len(data)))
data = snappy.Encode(c.snappyWriteBuffer, data)
Wouldn't it be good to truncate it again before this method returns? As it is, it will grow to the "largest packet you ever sent" and just stick around forever...?
This is intended. The idea is to keep allocating the buffer until it is large enough for the largest message we will ever send. RLPx messages are limited to 16MB, so it won't grow much larger than that.
Also, unrolling the loop in that new compute method is another possible micro-optimization. Hacks:

diff --git a/p2p/rlpx/rlpx.go b/p2p/rlpx/rlpx.go
index 72ba28d2ca..9132129ea9 100644
--- a/p2p/rlpx/rlpx.go
+++ b/p2p/rlpx/rlpx.go
@@ -272,7 +272,6 @@ func (m *hashMAC) compute(seed []byte) []byte {
if len(seed) != len(m.aesBuffer) {
panic("invalid MAC seed")
}
-
sum1 := m.hash.Sum(m.hashBuffer[:0])
m.cipher.Encrypt(m.aesBuffer[:], sum1)
for i := range m.aesBuffer {
@@ -283,11 +282,42 @@ func (m *hashMAC) compute(seed []byte) []byte {
return sum2[:16]
}
+func (m *hashMAC) compute2(seed []byte) []byte {
+ if len(seed) != len(m.aesBuffer) {
+ panic("invalid MAC seed")
+ }
+ copy(m.hashBuffer[:], seed)
+ sum1 := m.hashBuffer[0:16]
+ m.cipher.Encrypt(m.aesBuffer[:], sum1)
+
+ m.aesBuffer[0] ^= seed[0]
+ m.aesBuffer[1] ^= seed[1]
+ m.aesBuffer[2] ^= seed[2]
+ m.aesBuffer[3] ^= seed[3]
+ m.aesBuffer[4] ^= seed[4]
+ m.aesBuffer[5] ^= seed[5]
+ m.aesBuffer[6] ^= seed[6]
+ m.aesBuffer[7] ^= seed[7]
+ m.aesBuffer[8] ^= seed[8]
+ m.aesBuffer[9] ^= seed[9]
+ m.aesBuffer[10] ^= seed[10]
+ m.aesBuffer[11] ^= seed[11]
+ m.aesBuffer[12] ^= seed[12]
+ m.aesBuffer[13] ^= seed[13]
+ m.aesBuffer[14] ^= seed[14]
+ m.aesBuffer[15] ^= seed[15]
+
+ m.hash.Write(m.aesBuffer[:])
+
+ sum2 := m.hash.Sum(m.hashBuffer[:0])
+ return sum2[:16]
+}
+
// computeFrame computes the MAC of framedata.
func (m *hashMAC) computeFrame(framedata []byte) []byte {
m.hash.Write(framedata)
seed := m.hash.Sum(m.seedBuffer[:0])
- return m.compute(seed[:16])
+ return m.compute2(seed[:16])
}
// Handshake performs the handshake. This must be called before any data is written
Current status: 50% is sha3, 20% syscalls, 20% AES. The remaining 10% is mainly test overhead.
Another potential tiny thing -- this isn't covered by any existing benchmark though, so hard to evaluate:

diff --git a/p2p/rlpx/rlpx.go b/p2p/rlpx/rlpx.go
index 326c7c4941..99fd69a452 100644
--- a/p2p/rlpx/rlpx.go
+++ b/p2p/rlpx/rlpx.go
@@ -34,6 +34,7 @@ import (
"net"
"time"
+ "github.com/ethereum/go-ethereum/common/bitutil"
"github.com/ethereum/go-ethereum/crypto"
"github.com/ethereum/go-ethereum/crypto/ecies"
"github.com/ethereum/go-ethereum/rlp"
@@ -486,11 +487,14 @@ func (h *handshakeState) secrets(auth, authResp []byte) (Secrets, error) {
}
// setup sha3 instances for the MACs
+ xorSum := make([]byte, len(s.MAC))
mac1 := sha3.NewLegacyKeccak256()
- mac1.Write(xor(s.MAC, h.respNonce))
+ bitutil.XORBytes(xorSum, s.MAC, h.respNonce)
+ mac1.Write(xorSum)
mac1.Write(auth)
mac2 := sha3.NewLegacyKeccak256()
- mac2.Write(xor(s.MAC, h.initNonce))
+ bitutil.XORBytes(xorSum, s.MAC, h.initNonce)
+ mac2.Write(xorSum)
mac2.Write(authResp)
if h.initiator {
s.EgressMAC, s.IngressMAC = mac1, mac2
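For context, the xor helper being replaced presumably returns a freshly allocated slice on every call, whereas bitutil.XORBytes fills a caller-provided destination, so the single xorSum buffer can serve both MAC setups. Roughly (the old helper here is a sketch, not a verbatim quote):

// xor-style helper that allocates its result on every call
// (inputs are assumed to be of equal length).
func xor(one, other []byte) []byte {
	out := make([]byte, len(one)) // fresh allocation per call
	for i := range one {
		out[i] = one[i] ^ other[i]
	}
	return out
}

// bitutil.XORBytes(dst, a, b) instead writes min(len(a), len(b)) bytes
// into dst and returns that count, letting dst be reused.

In the diff above, xorSum is reused for both mac1 and mac2, so the handshake performs one temporary allocation instead of two.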
This change significantly improves the performance of RLPx message reads and writes. In the previous implementation, reading and writing of message frames performed multiple reads and writes on the underlying network connection, and allocated a new []byte buffer for every read. In the new implementation, reads and writes reuse buffers and perform far fewer system calls on the underlying connection. This doubles the theoretically achievable throughput on a single connection, as shown by the benchmark result:

name          old speed      new speed       delta
Throughput-8  70.3MB/s ± 0%  155.4MB/s ± 0%  +121.11%  (p=0.000 n=9+8)

The change also removes support for the legacy, pre-EIP-8 handshake encoding. As of May 2021, no actively maintained client sends this format.